AudioLDM2 model reproduction: forward inference #366

NKNaN · 2023-12-29T07:10:08Z

Task: #250

text-to-audio inference now runs end to end.
text-to-speech still needs its parameters converted and the results debugged.

paddle-bot · 2023-12-29T07:10:14Z

Thanks for your contribution!

luyao-cv · 2024-01-02T03:37:56Z

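The PR contains a large number of files, and some of the code duplicates what the toolkit already provides. Those parts can be imported directly, for example the gpt2, latent_encoder, and unet folders.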

LokeZhou · 2024-01-02T06:35:34Z

paddlemix/models/audioldm2/clap_module/clap_amodel_configs/HTSAT-base.json

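Config files like this do not need to be uploaded; expose them uniformly through a from_pretrained()-style interface.

NKNaN (reply): The HTSAT config is now hard-coded in the source. The original author only provided the parameters for HTSAT-base, so the other config files have been deleted.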

LokeZhou · 2024-01-02T06:37:18Z

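Please provide alignment results for forward inference, either as result files or as screenshots showing that the output tensors match.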
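
One way to produce such an alignment result is to dump the same output tensor from both implementations and report the largest element-wise difference. The following is a minimal sketch in plain Python, assuming the tensors have already been flattened to lists of floats. The function names, the sample values, and the tolerances are illustrative assumptions, not values taken from this PR.

```python
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Return the largest element-wise absolute difference between two outputs."""
    if len(reference) != len(candidate):
        raise ValueError(
            f"shape mismatch: {len(reference)} vs {len(candidate)} elements"
        )
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def is_aligned(reference: Sequence[float], candidate: Sequence[float],
               atol: float = 1e-5) -> bool:
    """True when every element of candidate is within atol of reference."""
    # atol=1e-5 is an assumed default; choose one that suits the dtype in use.
    return max_abs_diff(reference, candidate) <= atol


# Hypothetical flattened outputs from the original and the ported model.
torch_out = [0.12345, -1.00000, 3.50000]
paddle_out = [0.12346, -1.00000, 3.49999]
print(f"max abs diff: {max_abs_diff(torch_out, paddle_out):.2e}")
print("aligned:", is_aligned(torch_out, paddle_out, atol=1e-4))
```

Reporting the maximum difference alongside the pass/fail flag makes it easier to see how close a failing comparison is to the chosen tolerance.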

NKNaN · 2024-01-03T14:00:22Z

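> Please provide alignment results for forward inference, either as result files or as screenshots showing that the output tensors match.

inference sample results.zip
These are the result files generated with different seeds for the text prompt "Musical constellations twinkling in the night sky, forming a cosmic melody. "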

NKNaN · 2024-01-12T08:25:32Z

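> The PR contains a large number of files, and some of the code duplicates what the toolkit already provides. Those parts can be imported directly, for example the gpt2, latent_encoder, and unet folders.

The original models differ somewhat from the toolkit's existing models in both structure and inference procedure, for example roberta-base and gpt2 here:
differences.xlsx

I'll simplify as much as I can.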

NKNaN · 2024-01-12T08:28:41Z

Converted parameter file (model_state.pdparams) and config files: https://aistudio.baidu.com/datasetdetail/252967
The parameters belonging to the original implementation's litema module have been removed from model_state.pdparams.
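
Removing a module's parameters from a converted checkpoint usually comes down to filtering the state dict by key prefix. The sketch below shows that idea on a plain dictionary. The helper name, the sample keys, and the `model_ema.` prefix are assumptions for illustration and may not match the real checkpoint.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")


def drop_prefixed(state_dict: Dict[str, T], prefixes: Iterable[str]) -> Dict[str, T]:
    """Return a copy of state_dict without keys that start with any given prefix."""
    blocked = tuple(prefixes)
    return {k: v for k, v in state_dict.items() if not k.startswith(blocked)}


# Hypothetical checkpoint keys; "model_ema." is an assumed prefix for the
# EMA (litema) weights and may differ in the actual file.
state_dict = {
    "unet.conv_in.weight": [0.1, 0.2],
    "model_ema.decay": [0.9999],
    "model_ema.unetconv_inweight": [0.1, 0.2],
}
inference_state = drop_prefixed(state_dict, ["model_ema."])
print(sorted(inference_state))
```

Returning a new dictionary leaves the loaded checkpoint untouched, which keeps the original available for comparison.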

LokeZhou · 2024-01-15T08:05:14Z

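> > The PR contains a large number of files, and some of the code duplicates what the toolkit already provides. Those parts can be imported directly, for example the gpt2, latent_encoder, and unet folders.
>
> The original models differ somewhat from the toolkit's existing models in both structure and inference procedure, for example roberta-base and gpt2 here: differences.xlsx
>
> I'll simplify as much as I can.

If the model structure is the same and only the inference procedure differs, you can override just forward. For reference, see https://github.com/PaddlePaddle/PaddleMIX/blob/develop/paddlemix/models/qwen_vl/modeling.py#L101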
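
The suggestion to override only forward can be illustrated without any framework. In the sketch below both classes are hypothetical stand-ins (in PaddleMIX the parent would be an existing model class built on `paddle.nn.Layer`), and the masking step is an invented difference used purely for illustration.

```python
class ExistingEncoder:
    """Stand-in for a model class the toolkit already provides (hypothetical)."""

    def __init__(self, scale: float = 2.0):
        # In a real model this would hold the layers and their weights.
        self.scale = scale

    def forward(self, inputs):
        return [x * self.scale for x in inputs]

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class AudioLDM2Encoder(ExistingEncoder):
    """Same structure and weights; only the inference procedure differs."""

    def forward(self, inputs, attention_mask=None):
        # Reuse the parent's computation, then apply the model-specific step.
        hidden = super().forward(inputs)
        if attention_mask is None:
            return hidden
        # Assumed difference for illustration: zero out masked positions.
        return [h if keep else 0.0 for h, keep in zip(hidden, attention_mask)]


encoder = AudioLDM2Encoder()
print(encoder([1.0, 2.0, 3.0], attention_mask=[1, 1, 0]))
```

Because the subclass defines no new attributes, parameter names stay identical to the parent's, so existing weights would load without key remapping.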

luyao-cv · 2024-01-15T11:27:48Z

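> > The PR contains a large number of files, and some of the code duplicates what the toolkit already provides. Those parts can be imported directly, for example the gpt2, latent_encoder, and unet folders.
>
> The original models differ somewhat from the toolkit's existing models in both structure and inference procedure, for example roberta-base and gpt2 here: differences.xlsx
>
> I'll simplify as much as I can.

The names in the network definitions are not fully aligned. Please align them with the toolkit's existing models. Where a structure differs from the existing model, override the forward function.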

NKNaN · 2024-01-15T15:10:01Z

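> If the model structure is the same and only the inference procedure differs, you can override just forward. For reference, see https://github.com/PaddlePaddle/PaddleMIX/blob/develop/paddlemix/models/qwen_vl/modeling.py#L101

> The names in the network definitions are not fully aligned. Please align them with the toolkit's existing models. Where a structure differs from the existing model, override the forward function.

OK, I'll revise it again.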

LokeZhou · 2024-01-18T02:28:53Z

LGTM. Please update to the latest paddlemix; we'll merge once CI passes.

NKNaN · 2024-01-18T03:03:33Z

> LGTM. Please update to the latest paddlemix; we'll merge once CI passes.

OK. The autoencoder and unet also need further changes; I should be able to finish them today.

NKNaN · 2024-01-18T06:29:25Z

Revised parameters and config: https://aistudio.baidu.com/datasetdetail/257191

LokeZhou · 2024-01-18T08:35:12Z

paddlemix/examples/audioldm2/README.md

+```bash
+python run_predict.py \
+--text "Musical constellations twinkling in the night sky, forming a cosmic melody." \
+--model_name_or_path "/home/aistudio/data/data252967" \


Don't use a concrete, machine-specific path here.

LokeZhou · 2024-01-18T08:35:55Z

paddlemix/examples/audioldm2/__init__.py

+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.


The __init__.py under examples can be removed.

LokeZhou · 2024-01-18T08:38:21Z

paddlemix/models/audioldm2/encoders/clap_encoder.py

+    #                 = sum_i x[i] sinc(pi * orig_freq * ((i - orig_freq) / orig_freq - j / new_freq))
+    #                 = sum_i x[i + orig_freq] sinc(pi * orig_freq * (i / orig_freq - j / new_freq))
+    # so y[j+new_freq] uses the same filter as y[j], but on a shifted version of x by `orig_freq`.
+    # This will explain the F.conv1d after, with a stride of orig_freq.


Some of these comments can be trimmed at your discretion.

LokeZhou · 2024-01-18T08:39:19Z

paddlemix/models/audioldm2/encoders/sequence2audiomae_encoder.py

+
+        # self.time_pool = max(self.cond_stage_config["crossattn_audiomae_pooled"]["params"]["time_pooling_factors"])
+        # self.freq_pool = max(self.cond_stage_config["crossattn_audiomae_pooled"]["params"]["freq_pooling_factors"])
+        # self.mae_token_num = int(512/(self.time_pool*self.freq_pool))


This can be deleted.

LokeZhou · 2024-01-18T08:39:31Z

paddlemix/models/audioldm2/encoders/sequence2audiomae_encoder.py

+            cond_dict = self.get_input(batch)
+
+        # self.model.train()
+        # print("!!!!!!!!!!!!!train")


LokeZhou · 2024-01-18T08:42:11Z

paddlemix/models/audioldm2/requirement.txt

@@ -0,0 +1,5 @@
+librosa
+ppdiffusers


The ppdiffusers package does not need to be listed here. It is enough for the README.md to point users to the installation instructions at https://github.com/PaddlePaddle/PaddleMIX/blob/develop/README.md?plain=1#L62

LokeZhou · 2024-01-18T08:45:41Z

I've left a few more minor comments. Also, CI is still failing; is this PR based on the latest paddlemix? Please compare against this unit test in particular: https://github.com/PaddlePaddle/PaddleMIX/blob/develop/tests/models/test_minigpt4.py#L561

NKNaN · 2024-01-18T09:59:51Z

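> I've left a few more minor comments. Also, CI is still failing; is this PR based on the latest paddlemix? Please compare against this unit test in particular: https://github.com/PaddlePaddle/PaddleMIX/blob/develop/tests/models/test_minigpt4.py#L561

Thanks for the review. I just ran git pull --rebase, so the local branch should already be up to date here:
https://github.com/NKNaN/PaddleMIX/blob/ayase-develop/tests/models/test_minigpt4.py#L561

The test failing in CI, test_save_load in tests.models.test_minigpt4.MiniGPT4VisionModelTest, calls the method inherited from the parent class ModelTesterMixin. Should test_save_load be overridden as pass in the MiniGPT4VisionModelTest class?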
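
The override being proposed could look roughly like the sketch below. The two classes are simplified stand-ins that only borrow the names from the discussion; the real mixin and test case live in the PaddleMIX test suite. Using `unittest.skip` instead of a bare `pass` is an assumed preference here, chosen so that the test is reported as skipped rather than silently passing.

```python
import unittest


class ModelTesterMixin:
    """Stand-in for the shared mixin; the real one lives in the test suite."""

    def test_save_load(self):
        # Placeholder for the inherited check that currently fails in CI.
        raise AssertionError("save/load round trip is not supported")


class MiniGPT4VisionModelTest(ModelTesterMixin, unittest.TestCase):
    # Overriding the inherited method replaces it for this class only.
    @unittest.skip("save/load is not applicable to this model (assumed reason)")
    def test_save_load(self):
        pass


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MiniGPT4VisionModelTest)
result = unittest.TestResult()
suite.run(result)
print(f"failures={len(result.failures)} skipped={len(result.skipped)}")
```

Other test classes that inherit from the mixin keep running the original check, since the override is local to one subclass.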

LokeZhou · 2024-01-19T03:52:50Z

> MiniGPT4VisionModelTest

We'll investigate that separately later; the current CI failure does not block merging.

paddle-bot bot added the contributor label Dec 29, 2023

shiyutang requested a review from JunnYu January 2, 2024 03:14

LokeZhou reviewed Jan 2, 2024

luotao1 added the HappyOpenSource (Happy Open Source activity issues and PRs) label Jan 15, 2024

luotao1 assigned luotao1 and LokeZhou Jan 15, 2024

NKNaN force-pushed the ayase-develop branch from 02c628d to 2ad8b4c Compare January 18, 2024 06:27

LokeZhou reviewed Jan 18, 2024

NKNaN added 4 commits January 18, 2024 17:57

add audioldm2

6055bda

delete unnecessary files

85f9426

delete unnecessary files

3e88a0f

refine model code

1129d74

NKNaN added 4 commits January 18, 2024 17:57

revise roberta and gpt2

b08f2a4

revise autoencoderkl

ecb45ca

refine unet

bc55fba

update code

e170511

NKNaN force-pushed the ayase-develop branch from 9ff9536 to e170511 Compare January 18, 2024 09:58

LokeZhou approved these changes Jan 19, 2024

LokeZhou merged commit f049e2d into PaddlePaddle:develop Jan 19, 2024
1 of 3 checks passed

NKNaN mentioned this pull request Jan 19, 2024

[WeeklyReports] 2023.12.25~2024.1.19 weekly report collection PFCCLab/Starter#97

Closed

24 tasks

NKNaN mentioned this pull request Jun 11, 2024

WAVE SUMMIT+ 2024 H1 PaddlePaddle Open Source Star selection: information collection PaddlePaddle/community#892

Closed

westfish pushed a commit to westfish/PaddleMIX that referenced this pull request Sep 25, 2024

AudioLDM2 model reproduction: forward inference (PaddlePaddle#366)

5c68cbd

Task: PaddlePaddle#250 - text-to-audio inference now runs end to end
